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Abstract 

We consider least squares semidefinite programming (LSSDP) where the primal matrix 
variable must satisfy given linear equality and inequality constraints, and must also he in the 
intersection of the cone of symmetric positive semidefinite matrices and a simple polyhedral 
set. We propose an inexact accelerated block coordinate descent (ABCD) method for solving 
LSSDP via its dual, which can be reformulated as a convex composite minimization problem 
whose objective is the sum of a coupled quadratic function involving four blocks of variables 
and two separable non-smooth functions involving only the first and second block, respectively. 

Our inexact ABCD method has the attractive 0{l/k^) iteration complexity if the subproblems 
are solved progressively more accurately. The design of our ABCD method relies on recent 
advances in the symmetric Gauss-Seidel technique for solving a convex minimization problem 
whose objective is the sum of a multi-block quadratic function and a non-smooth function 
involving only the first block. Extensive numerical experiments on various classes of over 
600 large scale LSSDP problems demonstrate that our proposed ABCD method not only 
can solve the problems to high accuracy, but it is also far more efficient than (a) the well 
known BCD (block coordinate descent) method, (b) the eARBCG (an enhanced version of 
the accelerated randomized block coordinate gradient) method, and (c) the APG (accelerated 
proximal gradient) method. 

Keywords: Least squares SDP, accelerated block coordinate descent, symmetric Gauss-Seidel. 
AMS subject classifications. 90C06, 90C22, 90C25, 65F10. 

1 Introduction 

Let be the space of n x n real symmetric matrices endowed with the standard trace inner 
product (•, •) and its induced norm || • ||. We denote by 5” the cone of positive semidefinite 
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matrices in S'^. For any matrix X G 5”, we use X 0 {X ^ 0) to indicate that X is a symmetric 
positive definite (positive semidefinite) matrix. 

Consider the following semidefinite programming (SDP) problem: 


min (C, X) 

( 1 ) 

s.t. AE{X) = hE, AiX-s = d, X X 

where bE ^ 5?™'® and C £ S'^ are given data, Ae ■ X ^ and Ai : X ^ are two given 
linear maps, V and K, are two nonempty simple closed convex sets, e.g., V = {W G 5” : L <W < 
U} with L,U G cS"" being given matrices and X = {w £ : I < w < u} with l,u £ being 

given vectors. When applying a proximal point algorithm (PPA) |25l [26] to solve Q, we need 
to solve the following subproblem in each iteration for a given point {X^,s^) and a parameter 
o-fc > 0: 

(2) 

This motivated us to study the following least squares semidefinite programming (LSSDP) which 
includes (§ as a particular case: 

(P) min l\\X - Gf + l\\s - gf 

s.t. A_b(A) = bE, AiX — s = 0, X G 5”, X £V, s £ X, 

where G £ g £ 3?”*^ are given data. In order for the PPA to be efficient for solving 0 , it is 
of great importance for us to design an efficient algorithm to solve the above problem (P). Thus, 
the objective of this paper is to achieve this goal via solving the dual of (P). 

The dual of (P) is given by 




fc+i — argmin ■ 

X,s 


{G, X) + ^(||X-X^||2 + ||s-,^||2) 

I Ae{X) = bE, AiX -s = 0, X £Sl, X £V, s£X 


(D) m.inF{Z,v,S,yE,yi) := -{bE, yE) + bs^iS) + d^{-Z) + 6^{-v) 

+2 + A*jyi + S + Z + G|p 

+U9 + v-yjr-l\\Gr-l\\gf, 

where for any given set C, 6c{-) is the indicator function over C such that 6c{u) = 0 if u £ C and 
oo otherwise, and 6q{-) is the conjugate function of 6c defined by 


6c{-) = sup(-, W). 

w^c 


(3) 


Problem 

form: 


D) belongs to a general class of multi-block convex optimization problems of the 


min{'I'(a:) := 6{x) -|- C(3;)}, (4) 

where x = (xi,..., Xq) £ X := Ai x • • • x Xq, and each Xi is a finite dimensional real Euclidean 
space equipped with an inner product (•, •) and its induced norm || • ||. Here 6{x) = Yli=i 

X ^ TZ and 9i : Xi ^ (—oo,-|-oo], i = are proper, lower semi-continuous convex 

functions. We assume that C is continuously differentiable on an open neighborhood containing 
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dom(0) := {x ^ X : 6{x) < 00 } and its gradient is Lipschitz continuous. Note that one can 
write (D) in the form of in a number of different ways. One natural choice is of course to 
express it in the form of Q for q = A with (xi, X2, X3, X4) = {{Z, v), S, ue, yi)- Another possibility 
is to express it in the form of Q for g = 2 with (xi,X2) = {{Z,v), {S,yE,yi))- 

For the problem in Q, a well known technique for solving it is the block coordinate descent 
(BCD) method, for examples, see [101 ESI ESI ES] and references therein. Specifically, at iteration 
k, one may update the blocks successively in the Guass-Seidel fashion (other rules can also be 

applied, see my- 


fc-l-l 

= argmin3,^T(xi,X2, . 




^k+1 

= arg min^,, T(x^^"*"^,.. 

'r- A 

■ ) -''i —1 ) 1 • • 


(5) 

ryk+l 

•kq 

= arg min^,^ T,.. 

rpk + 1 \ 

■ ) Xq_i , Xq). 




When the subproblems in ([^ are not easily solvable, a popular approach is to use a single step of 
the proximal gradient (PG) method, thus yielding the block coordinate gradient descent (BGGD) 
method [33 El- 

Problem Q can also be solved by the accelerated proximal gradient (APG) method with 
iteration complexity of 0(1 /such as in HailBlIIIllIlEl. In the best case, BCD-type methods 
have an iteration complexity of 0{l/k) (see |27[ E]1. and can hardly be accelerated to 0{l/k'^) as in 
the case for the APG method. Nevertheless, some researchers have tried to tackle this difficulty 
from different aspects. Beck and Tetruashvili |2] proposed an accelerated BGGD method for 
solving Q by assuming that 0 = 0, i.e., without the nonsmooth terms. Very recently, Chambolle 
and Pock |5] presented an accelerated BCD method for solving Q by assuming that C{x) has 
the special form ^(a:) = Yli<i<j<q In theory, this method can be applied to the 

problem (D) by choosing xi = {Z,v) and X 2 = {S,yE,yi)- But the serious practical disadvantage 
is that the method in [5] does not cater for inexact solutions of the associated subproblems and 
hence it is not suitable for large scale problems since for (D) the subproblems must be solved 
numerically. 

Besides BCD-type methods based on deterministic updating order, there has been a wide 
interest in randomized BCD-type methods. Nesterov [20] presented a randomized BCD method 
with unconstrained and constrained versions in which the selection of the blocks is not done 
by a deterministic rule (such as the cyclic rule Q), but rather via a predescribed distribution. 
Furthermore, an accelerated 0(1/k'^) variant was studied for the unconstrained version. To extend 
the accelerated version for the more generic problem Q, Fercoq and Richtarik [9] proposed 
an accelerated 0(1/k'^) randomized block coordinate gradient (ARBCG) method. For strongly 
convex functions, Lin, Lu and Xiao [16] showed that a variant of this method can achieve a 
linear convergence rate. However, from our numerical experience, the ARBGG method usually 
can only solve (D) to an accuracy of since only the maximal eigenvalue of Ae-A!^ is 

used when updating yE (similarly for updating yj). Even a numerically much enhanced version 
of the ARBCG (denoted as eARBCG) method with a weighted norm (for which the theoretical 
convergence needs to be studied) is also typically 3-4 times slower than the accelerated block 
coordinate descent (ABCD) method with a special deterministic rule which we will propose later. 
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In this paper we aim to design an efficient inexact accelerated BCD-type method whose worst- 
case iteration complexity is 0{l/k‘^) for (D). We achieve this goal by first proposing an inexact 
accelerated block coordinate gradient descent (ABCGD) method for the general convex program¬ 
ming problem Q with 9s = ■ ■ ■ = 9q = 0, and then apply it to (D) to obtain an inexact ABCD 
method. Note that when is a convex quadratic function, the ABCGD method and the ABCD 
method are identical. Our ABCD method is designed based on three components. First, we ap¬ 
ply a Danskin-type theorem to eliminate the variable xi in Q. Then we adapt the inexact APG 
framework of Jiang, Sun and Toh proposed in m to solve the resulting reduced problem. By 
choosing an appropriate proximal term and adapting the recently developed inexact symmetric 
Gauss-Seidel decomposition technique m for solving a multi-block convex quadratic minimiza¬ 
tion problem (possibly with a single nonsmooth block), we show that each subproblem in the 
inexact APG method can be solved efficiently in a fashion almost like the symmetric Gauss-Seidel 
update. 

As already mentioned, one can also apply the APG method directly to solve (D), or more 
generally Q. In this paper, we also adapt the APG method to directly solve (D) for the sake 
of numerical comparison. In addition, since the ARBGG method does not perform well for 
solving (D), again for the sake of numerical comparison, we propose an enhanced version (called 
eARBGG) of an accelerated randomized block coordinate gradient method designed in |T6] for 
solving (D). As one can see later from the extensive numerical experiments conducted to evaluate 
the performance of various methods, though the BGD, APG and eARBGG methods are natural 
choices for solving (D), they are substantially less efficient than the ABGD method that we have 
designed. In particular, for solving (D), the ABGD method is at least ten times faster than the 
BGD method for vast majority of the tested problems. It is quite surprising that a simple novel 
acceleration step with a special BGD cycle, as in the case of the ABGD method, can improve the 
performance of the standard Gauss-Seidel BGD method by such a dramatic margin. 

The paper is organized as follows. In the next section, we introduce the key ingredients needed 
to design our proposed algorithm for solving Q, namely, a Danskin-type theorem for parametric 
optimization, the inexact symmetric Gauss-Seidel decomposition technique, and the inexact APG 
method. In Section we describe the integration of the three ingredients to design our inexact 
ABGGD method for solving (j^. Section [^presents some specializations of the introduced inexact 
ABGGD method to solve the dual LSSDP problem (D), as well as discussions on the numerical 
computations involved in solving the subproblems. In Sectionwe describe the direct application 
of the APG method for solving (D). In addition, we also propose an enhancement of the acceler¬ 
ated randomized block coordinate gradient method in |16j for solving (D). Extensive numerical 
experiments to evaluate the performance of the ABGD, APG, eARBGG and BGD methods are 
presented in Section Finally, we conclude the paper in the last section. 

Notation. For any given self-adjoint positive semidefinite operator T that maps a real Euclidean 
space X into itself, we use to denote the unique self-adjoint positive semidefinite operator 
such that = T and define 

IIxIIt- := (x, Tx) = Mx G X. 
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2 Preliminaries 


2.1 A Danskin-type theorem 

Here we shall present a Danskin-type theorem for parametric optimization problems. 

Let X and y be two hnite dimensional real Euclidean spaces each equipped with an inner 
product (•, •) and its induced norm || • || and (p : y ^ (—oo,-|-oo] be a lower semi-continuous 
function and D be a nonempty open set in X. Denote the effective domain of p by dom(<y9), which 
is assumed to be nonempty. Let (/)(•, ■) ■. y x X ^ (— 00 , -|-oo) be a continuous function. Dehne 
the function g : Q ^ [— 00 , -|-oo) by 

g{x) = inf {(^( 7 /) -h x)}, x G D. (6) 

y&y 

For any given x G D, let Xi{x) denote the solution set, possibly empty, to the optimization 
problem Q. The following theorem, largely due to Danskin [^, can be proven by essentially 
following the proof given in [HI Theorem 10.2.1]. 

Theorem 2.1. Suppose that p : y ^ {—oo, -|-oo] is a lower semi-continuous function and 4>{-, •) : 
T X A —)• (— 00 , -|-oo) is a continuous function. Assume that for every y G dom((^) / 0, (p{y, •) is 
differentiable on D and Vx4>{-, •) is continuous on dom((^) x D. Let x € LI be given. Suppose that 
there exists an open neighborhood M C LI of x such that for each x' G M, M.{x') is nonempty and 
that the set Ux'£jpM{x') is bounded. Then the following results hold: 

(i) The function g is directionally differentiable at x and for any given d £ X, 

gfx;d)= inf {Vx(f{y,x),d). 

y£M{x) 


(ii) If Xi{x) reduces to a singleton, say Xi{x) = {y{x)}, then g is Gateaux differentiable at x 
with 

Vg{x) = Vx(j){y{x),x). 


Danskin’s Theorem 2.1 will lead to the following results when convexities on p and f are 
imposed. 


Proposition 2.2. Suppose that p : y ^ (—oo,-|-oo] is a proper closed convex funetion, D is an 
open convex set of X and </>(•, •) : T x A —)■ (— 00 ,- 1 - 00 ) is a elosed convex function. Assume that 
for every y G dom((/7), 4>{y,-) is differentiable on D and Vxfi-,-) Is continuous on dom(<^) x D 
and that for every x' G LI, Al(x') is a singleton, denoted by {y{x')}. Then the following results 
hold: 


(i) Let X £ LI be given. If there exists an open neighborhood M f LI of x such that yf) is bounded 
on any nonempty compact subset of J\f, then the convex function g is differentiable on M 
with 

Vfl'(x') = Va,(/>(y(x'),x') Vx'gAA. 
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(ii) Suppose that there exists a convex open neighborhood M C O such that y{-) is bounded on any 

nonempty compact subset of M. Assume that for any y G dom((/?), Vx4>{y,-) Lipschitz 
continuous on M and that there exists a self-joint positive semidefinite linear operator E ^ 0 
such that for all x ^ M and y G dom((/?), 

where for any given y G dom((/?), x) is the generalized Hessian of (f{y, •) at x. Then 

0 < g{x') — g{x) — {Vg{x),x'— x) < ^{x'— x,T.{x'— x)) Vx'jXGA/". (7) 

Moreover, if N = Tt = X, then g{ ) is Lipschitz continuous on X with the Lipschitz 
constant ||E ||2 (the spectral norm ofT,) and for any x £ X, 

E^g VgG9Lff(x), (8) 

where df^dix) denotes the generalized Hessian of g at x. 

(iii) Assume that for every x £ Ll, 4>{-,x) is continuously differentiable on an open neighborhood 
containing dom((/9). Suppose that there exist two constants a > 0 and L > 0 such that for 
all x', X £ Ll, 

{Vy4>{y, x) - Vycfiy, x),y' - y) > a\\y - yf ^ y',y £ dom((^), 

and 

\\Vy(j){y,x') - Vy4){y,x)\\ < L\\x' - x|| Vy G dom(¥?). 

Then yf) is Lipschitz continuous on ri such that for all x',x £ Ll, 

hix') -yix)\\ < (L/a)||x'- x||. 


Proof. Part(i). The convexity of g follows from Rockafellar |24l Theorem 1] and the rest of part 
(i) follows from the above Danskin Theorem 2.1 and |231 Theorem 25.2], 

Part (ii). For any x', x £ M, we obtain from part (i) and the generalized mean value theorem 
(MVT) [121 Theorem 2.3] that 


y(x') -g{x) = 

< 


< 


(j){y{x'),x') + ifiyix')) - [4>{y{x),x) + (piy{x))] 

4>{y{x),x') + ip{y{x)) - [(/)(y(x),x) + ip{y{x))] 

4>{y{x),x') - 0(y(x),x) 

{^x4>{y{x),x), x' — x) + ^(x' — X, E(x' — x)) (by using the MVT) 
{Vg{x),x' — x) + -{x' — X, E(x' — x)) (by using Part (i)), 


which, together with the convexity of g, implies that Q holds. 

Assume that Af = Ll = X. By using 0 and [18t Theorem 2.1.5], we can assert that Vy is 
globally Lipschitz continuous with the Lipschitz constant llEjj 2 . So by Rademacher’s Theorem, 
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the Hessian of g exists almost everywhere in X. From Q, we can observe that for any x ^ X 
such that the Hessian of g' at x exists, it holds that 

^lx9{x) < S. 

Thus, Q follows from the definition of the generalized Hessian of g. 

Part (iii). The conclusions of part (iii) follow from part (i), the maximal monotonicity of 
dip{-), the assumptions and the fact that for every x' G H, 

0 G dip{y{x')) + Vy4>{y{x'),x). 


We omit the details here. 


□ 


2.2 An inexact block symmetric Ganss-Seidel iteration 

Let s > 2 be a given integer and X := Xi x X 2 x ... x Xg, where Aj’s are real finite dimensional 
Euclidean spaces. For any x £ X, we write x = (xi, X 2 ,..., Xg) with x* G X^. Let Q : A —)• A be a 
given self-adjoint positive semidefinite linear operator. Consider the following block decomposition 


Qx = 


( Qu 
Q 


12 


V 2 


Is 


Qi2 • • • Qls ^ 


( X\S 

Q 22 • • • Q2s 


X2 

0-23 ■ ■ ■ Q-ss ) 


\Xg ) 


/ 0 Qi2 


Ux = 



\ 


Xl ^ 
X2 


Qs-l,s 

0 / 


\Xg J 


where Qa : Aj —>■ Aj, i = l,...,s are self-adjoint positive semidefinite linear operators, Qij ; 
Xj —>■ Aj, i = 1,..., s — 1, j > i are linear maps. Note that Q = U* +V +U where Vx = 
(Qiixi,..., QggXg). 

Let r = (ri, r 2 ,..., r^) G A be given. Define the convex quadratic function /i : A —)• by 


h{x) := -(x, Qx) — (r, x), x G A. 


Let p : Ai —)• (— 00 , -t-oo] be a given lower semi-continuous proper convex function. 

Here, we further assume that Qa, i = 1,..., s are positive definite. Define 

x<i := (xi, X 2 ,..., Xi), x>i := (xj, Xj+i,..., x^), f = 0,..., s -|- 1 

with the convention that x<o = x>s+i = 0. 

Suppose that Si, 6^ G Aj, i = l,...,s are given error vectors, with = 0. Denote <5 = 
(5i,..., 5s) and h"'' = (hj *",... ,Sf). Define the following operator and vector: 

T := UV-^U*, (9) 

A(5,5+) := 6+ +UV-^{5+ -S). (10) 


Let y G A be given. Define 


X := argmm 

X 


|p(xi) -F h{x) + ^||x - y||^ - (A(5,5+), x)|. 


( 11 ) 
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The following proposition describing an equivalent BCD-type procedure for computing x"*", is 
the key ingredient for our subsequent algorithmic developments. The proposition is introduced by 
Li, Sun and Toh [TTj for the sake of making their Schur complement based alternating direction 
method of multipliers m more explicit. 

Proposition 2.3. Assume that the self-adjoint linear operators Qu, i = are positive 

definite. Let y £ X be given. For i = s,... ,2, define Xi G Xi by 


Xi := argmin{p(7/i)-h/i(?/<i_i,Xi,x>i+i) - (5*, Xi)} 

Xi 

— ~ Ylj=iQjiyj ~ J 2 j=i+iQij^j)■ ( 12 ) 

Then the optimal solution x+ defined by 0 can be obtained exactly via 
' x'l = {p{xi)+ h{xi,x> 2 ) - {5^, xi)}, 

< xf = aTguiin^. {p{xf)+ h{x^-_.^^,xi,x>i+i) - Xi)} (13) 

= + i = 2,...,s. 

Furthermore, FL := Q + T = {T> + U)V~^{V +W) is positive definite. 


For later purpose, we also state the following proposition. 

Proposition 2.4. Suppose that FL '.= Q + T = {V + U)V~^{V + U*) is positive definite. Let 
f = \\'H-^/^A{6,6+)\\. Then 

- 6) +V^/'^{V +U)-^S\\ < - S)\\ + \\n-^^^6\\. (14) 

Proof. The proof is straightforward by using the factorization of FL. □ 


Remark 1. In the computation in (12) and (13), we should interpret the solutions Xi, xj as 
approximate solutions to the minimization problems without the terms involving 6i and 5f. Once 
these approximate solutions have been computed, they would generate the error vectors di and 6f. 
With these known error vectors, we know that the computed approximate solutions are actually 
the exact solutions to the minimization problems in (12) and (13). 


2.3 An inexact APG method 

For more generality, we consider the following minimization problem 


min{F(x) := p{x) + /(x) | x G X}, 


(15) 


where A is a hnite-dimensional real Euclidean space. The functions / : A —)• 5?, p : A —)■ (—oo, oo] 
are proper, lower semi-continuous convex functions (possibly nonsmooth). We assume that / is 
continuously differentiable on A and its gradient V/ is Lipschitz continuous with modulus L on 
A, i.e., 

l|V/(x)-V/(y)|| <L||x-y|| Vx,?/GA. 


We also assume that problem (15) is solvable with an optimal solution x* G dom(p). The inexact 
APG algorithm proposed by Jiang, Sun and Toh m for solving (fi^ is described as follows. 
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Algorithm 1. Input (z dom(p), ti = 1. Set k = 1. Iterate the following steps. 

Step 1. Find an approximate minimizer 

x'^ « argminj^g;^.|/(/) + {Vf{y^), y - /) + ^{y - - /)) +p(y)}, ( 16 ) 

where Tik is a self-adjoint positive definite linear operator that is chosen by the user. 

Step 2. Compute and = x^ + 


Given any positive dehnite linear operator "Hj : X ^ A, define gj(-, •) : A x A —)• SR by 


= fiy) + x-y) + -{x- y, Hjix - y)). 

Let {cfc} be a given convergent sequence of nonnegative numbers such that 

E OO ^ 

k=i < oo. 

Suppose that for each j, we have an approximate minimizer: 

x^ Ri argmin{p(x) + qj{x^y^) \ x G A} 
that satisfies the following admissible conditions 

F{x^) < p{x^) + qj{x^,y^), 

^f{y^) + T-Lj{x^ - y^) + 7 -^ = <51 with < ejl{y/2tj), 


(17) 

(18) 
(19) 


where 7 -^ G dp{x^) (Note that for x^ to be an approximate minimizer, we must have x^ G dom(p).) 
Then the inexact APG algorithm described in Algorithm 1 has the following iteration complexity 
result. 


Theorem 2.5. Suppose that the conditions (18) and (19) hold and Tik-i F Jif. )>- 0 for all k. 
Then 


0 < F{x’‘)-F{x*) < 


X + Cfc)^ + , 


(A:+ 1)2 

where r = l(x° - x*, ?7i(x° - x*)), = Yl'j=i ^k = Z)j=i ^‘j- 

Proof. See m Theorem 2.1]. 


( 20 ) 


□ 


3 An inexact accelerated block coordinate gradient descent method 

We consider the problem 

min{F(xo,x) :=+(xo)+p(xi) + (/>(xo,x) I xq G Aq, x G A}, (21) 
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where x = {xi,...,Xs) G x ■ ■ ■ x Xs{=: X), ip : Xq ^ (— 00 , 00 ] and p : Xi —)> (— 00 , 00 ] 
and (j) : Xq X X ^ ^ are three closed proper convex functions and Xq, Xi,..., Xg are finite 
dimensional real Euclidean spaces. We assume that problem (21) is solvable with an optimal 
solution (xq, X*) G dom((^) x dom(p) x AI 2 x • • • x Xg. Define the function f : X ^ [— 00 , + 00 ) by 

/(x) = inf {(/?(xo) + (/)(xo,x)}, X = (xi,... ,x^) G df. (22) 

xq&Xq 


For any given x G X, let A4{x) denote the solution set to the optimization problem in (22). 
Suppose that the assumptions in Part ( ii)p] of Proposition 2.2 imposed on ip and (j) hold for 
Q. = X. Then, by Part (ii) of Proposition 2.2 we know that / is continuously differentiable and 
its gradient V/ is globally Lipschitz continuous, and for all x,y G X, 


f{x) - f{y) - {Vf{y),x -y) <^{x -y, C{x - y)), 


(23) 


where £ ; T —)• T is a self-adjoint positive semidefinite linear operator such that for any x G X, 

Chn ynGd^fix), (24) 

where 9^/(x) is the generalized Hessian of / at x. One natural choice, though not necessarily the 
best, for £ is £ = S, where S is a self-adjoint positive semidefinite linear operator that satisfies 


Ti'^% V'H G d‘^^(f){xo, x) V xo G dom(/?, x G X. 

Here for any given xq G domy?, d^^(j){xo,x) is the generalized Hessian of 4>{xo, •) at x. 
Now we consider an equivalent problem of (21): 

min{F(x) := p{xi) + /(x) | x G X}. 

Given a self-adjoint positive semidefinite linear operator Q : X ^ X such that 

Qh C, Qii>- 0, i = 

Define T and A by ([^ and (10), respectively, and h{-, ■) : X x X ^ ?R.hy 
Hx,y) ■■= f{y) + X-y)+ ^{x-y, Q{x-y)), 

= f{y) + {'^x4>ixo{y),y), X-y)+ ^{x-y, Q{x-y)), x,yGX, 


(25) 


(26) 


(27) 


(28) 


where xo(y) is the unique solution to infa;Qg;fQ{(^(xo) -|- (j){xo,y)}. From (23), we have 

f{x)<h{x,y) \/x,yGX. 

We can now apply the inexact APG method described in Algorithm 1 to problem ( |26[ ) to 
obtain the following inexact accelerated block coordinate gradient descent (ABCGD) algorithm 
for problem (21). 


^For subsequent discussions, we do not need the assumptions in Part (iii) of Proposition 2.2 though the results 
there have their own merits. 
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Algorithm 2. Input ^ dom(p) x A 2 x • • • x Xg, ti = 1. Let {efc} be a summable sequence 

of nonnegative numbers. Set k = 1. Iterate the following steps. 


Step 1. Suppose G Xi, i = 1,..., s, with 5i = 0, are error vectors such that 


max{||,5'=||,P'^||}<efc/(v/24). 


Compute 


Xq 


X = 


arg mm 

Xo 


in|(^(xo) + (/)(xo,y^)|, 


arg mm 

X 


in|p(xi) + /i(x,/) + -||x-/||^- x)|. 


(29) 


(30) 

(31) 


Step 2. Compute = x^ + (jjXi'j ^)- 


The iteration complexity result for the inexact ABCGD algorithm described in Algorithm 2 
follows from Theorem 2.5 without much difficulty. 

Theorem 3.1. DenotekL = Q + T and (3 = 2\\'D~^/‘^\\ + Let{{xQ,x^)} he the sequence 

generated by Algorithm 2. Then 


0 < F{x’‘)-F{x*) < 


T + + 2/3^, 


{k + lf 

where r = \{x^ — x*, TL{x^ — x*)), Ck = X)j=i = Yl^=i Consequently, 

4 


0 < F{x’^,x^) - F{xl,x*) < 


Proof. From (28), we have 


{k + iy 


T + /3efc) + 2/3 ik 


1 , 


.k\\2 


F{x^) < p{xt) + h{x\y^) + -||x" - /nr- 


From the optimality condition for (31), we can obtain 

V,c/>(x^, /) + nix^ - /) + 7*^ = A(5^ 

we know that 


where 7 ^ G 5p(x^). By using ([2^ and Proposition 


2.4 




< (2||p-V2|| + p-i/2||)max{||,5'=||,p' 


< 


/3efc 

V2tk 


(32) 


(33) 


(34) 


(35) 


Then (32) follows from Theorem 2.5 


□ 
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Remark 2. (a) Note that we can use the symmetric Gauss-Seidel iteration described in Proposition 


descent (BCGD) cycle with the order xq —>• Xs —)• Xg-i X 2 —)• xi —)• X 2 •••—)• 

(b) Assuming that Xq, x^, ..., X 2 have been computed. One may try to estimate X 2 , • • •, xj 

by using x^,. ■ ■ ,Xg, respectively. In this case the corresponding residual vector 5^ would be given 
by: 


2.3 to compute x^ in (31). Therefore, Step 1 is actually one special block coordinate gradient 


= + - yf), i = 2 ,..., s. (36) 

In practice, we may accept sueh an approximate solution xf = x^ for i = 2,...,s, provided 
a condition of the form ||(5f|| < c\\6^\\ (for some constant c > 1 say c = lOj is satisfied for 
i = 2,...,s. When such an approximation is admissible, then the linear systems involving Qu 
need only be solved once instead of twice for i = 2,...,s. We should emphasize that such a 
saving can be significant when the linear systems are solved by a Krylov iterative method such 
as the preconditioned conjugate gradient method. Of course, if the linear systems are solved by 
computing the Gholesky factorization (which is done only once at the start of the algorithm) of 
Qii, then the saving would not be as substantial. 


4 Two variants of the inexact ABCD method for (D) 


Now we can apply Algorithm 2 directly to (D) to obtain two variants of the inexact accelerated 
block coordinate descent method. In the first variant, we apply Algorithm 2 to (D) by expressing 
it in the form of 0 with q = A and (xi, X2, X3, X4) = {{Z,v), S,yE,yi). In the second variant, we 
express (D) in the form of (Q with q = 2 and (xi,X 2 ) = {{Z,v), {S,yE,yi)). For the remaining 
parts of this paper, we assume that Ae ■ W —>■ is onto and the solution set of (D) is 


nonempty. The convergence of both variants follows from Theorem 3.1 for Algorithm 2 and will 
not be repeated here. 


The detailed steps of the first variant of the inexact ABCD method are given as follows. 
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Algorithm ABCD-1: An inexact ABCD method for (D). 

Select an initial point {Z ^= {Z^,v^, ,y^,y^) with (——ti°) G dom((5p) x 

dom((5^). Let {e^} be a summable sequence of nonnegative numbers, and ti = 1. Set k = 1. 

Iterate the following steps. 

Step 1 . Suppose that dg, (5g G 7^™^ and 6j, 5j € are error vectors such that 

max{||4||,||{f||,||i|||,||if||)<et/(V24). 

Let B!^ = A*^y% + + 5^ + G. Compute 

{Z\v^) = argmin^,4F(Z,^;,5^y|,yf)} = (^p(i^")-.R^ - yf) - (^ - yf)), 

y% = aigimny^{F{Z^,v’^,S’^,yE,y^i) - {5%,yE)], 

fj = argmin^^{F(Z^u^5^yiy,)-(5,^y,)}, 

= argmin5{F(Z^u^5,y|,yf)} = + + Z" + G)), 

yf = argminj^^{F(Z'',u^,5^,y|,y7) - (<5f, y/)}, 

y| = aigTn.\nyj^{F{Z'^,v^,S^,yE,y'!) - {5%,yE)}. 

Step 2 . Set tk+i = = 4 ^- Compute 

5^+1 = 5"= + - S^-^), y^+i = y| + /3fc(y| - y^"'), y,"+' = yf + /Ifc(yf - y^'). 


Remark 3. (a) To compute y^ in Algorithm ABCD-1, we need to solve the following linear system 
of equations: 


(AeAe)])^ ~ bE ~ AEiAjijj F A Z^ + G) (37) 


with the residual norm 

ll*^£ll ~ II^E ~ AEiAjtjj A + Z^ A G) — AeAeTJ^W < efc/ (V^tk)- (38) 


If the (sparse) Cholesky factorization of AeA% can he computed (only once) at a moderate cost, 
then (37) can he solved exactly (i.e., = 0); otherwise (37) can he solved inexactly to satisfy 

(38) by an iterative method such as the preconditioned conjugate gradient (PCG) method. The 
same remark also applies to the computation of y^,y^,y^. 


(h) From the presentation in Step 1 of Algorithm ABCD-1, it appears that we would need to solve 
the linear systems involving the matrices AeA*^ and AiA) AT twice. In practice, one can often 
avoid solving the linear systems twice if y^ and y\ are already sufficiently accurate approximate 
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solutions for the respective second linear systems. More specifically, suppose that we approximate 
Ui by yf. Then the residual vector for the second linear system would be given by 

s’} = 6'} +Ai{S'^ -S’^). 

If the condition ||(5j|| < ekf{\/2tk) is satisfied, then we need not solve the second linear system 
sinee yj is already a suffieiently accurate solution for the second linear system. Similarly, if we 
use to approximate yE, then the corresponding residual vector would be given by 

4 = + AEiAKvi - Vi) + 3’^- s'^)- 

Again if the eondition that ||5g|| < ekf{V2tk) is satisfied, then we can take 4 = 4- 

For the second variant of the inexact ABCD method, we apply Algorithm 2 to (D) by ex¬ 
pressing it in the form of Q with q = 2 and (xi,X 2 ) = {{Z,v),{S,yE,yi)). In this case, we 
treat {S,yE,yi) as a single block and the corresponding subproblem in the ABCD method nei¬ 
ther admits an analytical solution nor is solvable via a linear system of equations. To solve the 
subproblem, we use a semismooth Newton-CG (SNCG) algorithm introduced in [371136] to solve 
it inexactly. 

The detailed steps of the second variant of the inexact ABGD method are given as follows. We 
should mention that it is in fact an accelerated version of a majorized semismooth Newton-GG 
(MSNGG) algorithm presented in [HH] . 


Algorithm ABCD-2: An inexact ABCD method with SNCG for (D). 

Select an initial point {Z^,y^,y}) = {Z^,v^, ,y%,yj) with (——u°) G dom(5.^) x 

dom(5^). Let {cfc} be a nonnegative summable sequence, = 1 and r = 10“®. Set k = 1. Iterate 
the following steps. 

Step 1. Suppose G 6j G are error vectors such that 

max{||5|||,||5f||}<e4(^^4). 

Let = A*^y’fi + AJijj + + G. Compute 

(Z^ v>^) = arg min {F(Z, v, S\y%, 4)} = {Ylp{R^) - R\ n^(y - 4) - {g - 4)), 

Z,v 

{S'',yE,yi) = aigmin (F{Z^,v^,S,yE,yi) + '^Wve - VEf “ (4: Ve) - (5f, y/)| . 
Step 2. Set tk+i = 

gk+l ~|+l ^ ^ _ yk-1^^ ~k+l ^ yk ^ 


In our numerical experiments, we always start with the first variant of the ABCD method, 
and then switch it to the second variant when the convergence speed of the first variant is deemed 
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to be unsatisfactory. As discussed in [371 [36], each iteration of the SNCG algorithm can be quite 
expensive. In fact, ABCD-1 can achieve a high accuracy efficiently for most of the problems 
perhaps because it has 0(l/fe^) iteration complexity and there is no need to be switched to 
ABCD-2. However, for some difficult problems, ABCD-1 may stagnate. In this case, ABCD-2 
can perform much better since it wisely makes use of second-order information and it has less 
blocks. 


4.1 An efficient iterative method for solving the linear systems 

Observe that in both the ABCD method and the APG method to be presented later in the next 
section, for solving (D), we need to solve the following linear systems 


Byi = r. 


(39) 


where B = {AiA*^ + oil)^ with a = 1 and a = ^ for the ABCD method and APG method, 
respectively. For the case where the matrix B and its (sparse) Cholesky factorization (need only 
to be done once) can be computed at a moderate cost, (39) can be solved efficiently. However, 


if the Cholesky factorization of B is not available, then we need an alternative efficient method 


to deal with (39) 
method. 


In this paper, we solve (39) by a preconditioned conjugate gradient (PCG) 
In order to speed up the convergence of the CG method, we construct the following 
preconditioner B based on a few leading eigen-pairs of B. Specifically, consider the following 
eigenvalue decomposition: 


B = PDP^ 


(40) 


where P G jg a,n orthogonal matrix whose columns are eigenvectors of B, and D is the 

corresponding diagonal matrix of eigenvalues with the diagonal elements arranged in a descending 
order: Ai > A 2 > • • • > Xmj- We choose the preconditioner to be 

kj mj 

B = Y,hPiPT + >^kj PiPT, (41) 

2=1 2 = /C/ + l 


where Pi is the i-th column of P, kj is a small integer such that 1 < A:/ <C mj and Xkj > 0. Note 
that we only need to compute (which only needs to be done once) the first ki eigenvalues of B 
and their corresponding eigenvectors. Then 


kj m/ kj kj 

S’* = ’Ek'p.P + k! E f’P = + 

2=1 2 = /C/ + l 


2=1 


2 = 1 


2=1 


From the expression of H, we can see that the overhead cost of applying the preconditioning step 
B~^v for a given v can be kept low compared to the cost of performing Bv. 

In our numerical experiments, we solve (39) approximately by applying the PGG method with 
the preconditioner (41) whenever the sparse Gholesky factorization of B is too expensive to be 
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computed. As we are solving a sequence of linear systems of the form (39) where the right-hand 
side vector changes moderately from iteration to iteration, we can warm-start the PCG method 
with the previous solution when solving the fcth linear system. For the problems which we 
have tested in our numerical experiments, where the number of linear inequality constraints mj in 
(P) can be very large, usually we only need less than ten PCG iterations on the average to solve 
( |39| ) to the required accuracy. This confirms that the preconditioner (41) is quite effective for the 
problems under our consideration, and we have thus presented an efficient iterative method to 
solve the large scale linear systems here. 


5 


An APG method and an enhanced ARBCG method for solving 

(D) 


Instead of the ABCD method, one can also apply the APG and accelerated randomized block 
coordinate gradient descent methods to solve (D). The details are given in the next two subsec¬ 
tions. 


5.1 An APG method for solving (D) 

To apply the APG method, we note that (D) can equivalently be rewritten as follows: 

(D) vain F{yE,yi,Z) ■.= -{hE, yE) + + \\\n.si{A*EyE + A*jyi + Z + G)\\^ 

m\9- y/f - ^ll(y - yi) - n^(y - yiW - • 

In order to apply the APG method to solve (D), we first derive a majorization of the objective 
function given the auxiliary iterate (yg, y^, Z^). Let N, Z) = 2 ||n 5 ^(M -\- N -\- Z + G)|p, 

^ 2 {yi) = \\\9 - yi\? - ^\\{9 - yi) - nK;(y - yiW- it is known that V(/?*(-), i = 1,2 are Lipschitz 
continuous. Thus we have that 


(pi{M,N,Z) < <fi{M'^,N^,Z'^) + {Usn{M'^ + N’^ + z’^ + G),M + N + Z-M^-N’^-Z’^) 

+^||M - M'^f + ^||A - + ^\\Z - Z^f, 

^2{yi) < ‘^2iyi)-{'nic{g-y^),yi-yi) + ^\\yi-yif, 
where M = A*EyE, N = AJyi, = A*Ey^, = AJy^. From here, we get 


< 


where 


F{yE,yi,Z)-F{y>k,ylZ>^) 

6U-Z) - {bE, yE - y%) + (A^ A^eVe - A*Ey%) + (A^ A*jyi - AJy^) + (A^ Z 
— yi — yi) + 2 + ^^lAJyi — A}yj\f -i- -||Z — -i- -\\yi 

(t>'^{yE,yi,z), 

{A*EfE + A*ifi + z" + G), s" = nyc(5 - fi). 
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At each iteration of the APG method [U [Mj applied to (D), we need to solve the following 
minimization subproblem at the /cth iteration: 

whose optimal solution is given by 

' y% = {AEA*E)-W{hE-AX^)+AEA*EfE), 

< y^, = {AiA} + \Xm,)-W{s^-AX^ + y^,)+AiA}y'l), (42) 

= Tlv{\X^ - Z^) - HX^ - Z^). 

With the computed iterate {y^,y^, Z^), we update the auxiliary iterate {y’^^, yj^^, simi¬ 

larly as in Step 2 of Algorithm ABCD-1. 

5.2 An enhanced ARBCG method for solving (D) 

Next we describe the enhanced accelerated randomized block coordinate gradient (ARBCG) 
method for solving (D). Our algorithm is modified from Algorithm 3 in m for the sake of 
numerical comparison, and the steps are given as follows. 


Algorithm eARBCG: An enhanced ARBCG method for (D). 

We use the notation in Q with q = 4 and x = {{Z, v), S, yE, yi)- Select an initial point = x^. 
Set ao = Ijq and k = 1. Iterate the following steps. 

Step 1. Compute Uk = \ cat-i + “ “fc-i) ■ 

Step 2. Compute x^^^ = (1 — ctk)x^ + akX^. 

Step 3. Choose ifc G {1,..., g} uniformly at random and compute 


x-^ = argmin 


in{(V4C(3 


^||™. _ ~k l |2 


h ^ik 


iATi. + 


where 71, Ti are identity operators, and Ta = AeA*^ and 71 = AiA*i +X. Here 9i{xi) = 
5p{—Z) + (5^(—u), 92{x2) = Ss^{S), 03 (x 3 ) = 0 = ^ 4 (^ 4 ), and C is the smooth part of F in 
(D). Set x^~^^ = x\ for all i ^ ik and 


.k+i — 

i 


x’l + qakix’l^^ - x^) if i = 4, 
Xi if i A 4- 


Note that in the original ARBCG algorithm in [16], the linear operators Ts and 71 are fixed 
to be Ts = Amax(A_EAg)X and 71 = Amax(A/Aj +X)X. Our enhancement to the algorithm is in 
using the operators Ts = AeA*^ and 71 = AiA*i +X. Indeed the practical performance of the 
eARBCG with the latter choices of 71 and 71 is much better than the former more conservative 
choices. However, we should note that although the non-convergence of the eARBCG method is 
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never observed for the problems tested in our numerical experiments, the theoretical convergence 
of the eARBCG has yet to be established, for which we leave as a future research topic. 


6 Numerical experiments 

In our numerical experiments, we test the algorithms designed in the last two sections to the least 
squares semidefinite programming problem (P) by taking G = —C,g = 0 for the data arising 
from various classes of SDP problems of the form given in Q. Specifically, the LSSDP problem 
corresponds to the first subproblem Q of the PPA for solving Q by setting k = 0, = 0, 

= 0 and cto = 1 . 


6.1 SDP problem sets 

Now we describe the classes of SDP problems we considered in our numerical experiments. 


(i) SDP problems coming from the relaxation of a binary integer nonconvex quadratic (BIQ) 
programming: 

min I ^x^Qx + {c, x) | x G {0,1}"'“^ |. (43) 

This problem has been shown in [3] that under some mild assumptions, it can equivalently be 
reformulated as the following completely positive programming (CPP) problem: 

min{^(Q, Xq) + (c, x) \ diag(Xo) = x,X = [Ao,x;x^, 1] G C^pj, (44) 


where Cpp denotes the n-dimensional completely positive cone. It is well known that even though 
Cpp is convex, it is computationally intractable. To solve the CPP problem, one can relax Cpp to 
5” n M, and the relaxed problem has the form 0: 


min ^{Q, Xo) + {c, x) 


s.t. diag(Alo) — x = 0, a = 1, X = 


Xo X 
T 

X a 


G 5 ”, X G P, 


(45) 


where the polyhedral cone V = {X G | X > 0}. In our numerical experiments, the test 
data for Q and c are taken from Biq Mac Library maintained by Wiegele, which is available at 
http://biqmac.uni-klu.ac.at/biqmaclib.html. 


(ii) SDP problems arising from the relaxation of maximum stable set problems. Given a graph 
G with edge set T, the SDP relaxation 0+(G') of the maximum stable set problem is given by 

e+{G) = max{(ee'^, X) | X) = 0, (i, j) G T, (I, X) = I, X G 5", X G P}, (46) 

where Eij = CjcJ + ejej and denotes the ith column of the identity matrix, P = {X G 5” | 
X > 0}. In our numerical experiments, we test the graph instances G considered in [29], |30|, and 

ISH. 
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(iii) SDP relaxation for computing lower bounds for quadratic assignment problems (QAPs). 
Given matrices A,B€ 5"', the QAP is given by := m.\ii{{X,AXB) : X G 77} where 77 is 
the set of n X n permutation matrices. 

For a matrix X = [rci,..., Xn] G we will identify it with the n^-vector x = [xi;...; Xn]- 

For a matrix Y G 77" , we let be the nxn block corresponding to Xix'J in the matrix xx'^. 

In [22], it is shown that Uq^p is bounded below by the following number generated from the SDP 
relaxation: 


V := min (77 ® A, Y) 

s.t. J27=ly" = I,{I,Y^^) = 6^j yi<i<j<n, 

{E, = 1 yi<i<j <n, Y £Sl, Y £V, 


(47) 


where the sign “ 0 ” stands for the Kronecker product, E is the matrix of ones, and 5ij = 1 if 
i = j, and 0 otherwise, V = {X G 5" \ X > 0}. In our numerical experiments, the test instances 
{A, 77) are taken from the QAP Library |11] . 


(iv) SDP relaxations of clustering problems (RCPs) described in eq. (13)]: 


mm 


{(-IF, X) \Xe = e, (7, X) = 77, A G A G T’}, 


(48) 


where IF is the so-called affinity matrix whose entries represent the pairwise similarities of the 
objects in the dataset, e is the vector of ones, and K is the number of clusters, V = {X G 5" | 
A > 0}. All the data sets we tested are from the UCI Machine Learning Repository (available 
at http://archive.ics.uci.edu/ml/datasets.html). For some large data instances, we only 
select the first n rows. For example, the original data instance “spambase” has 4601 rows, we 
select the first 1500 rows to obtain the test problem “spambase-large. 2” for which the number 
“2” means that there are K = 2 clusters. 


(v) SDP problems arising from the SDP relaxation of frequency assignment problems (FAPs) 
[Tj. Given a network represented by a graph G with edge set S and an edge-weight matrix IF, a 
certain type of frequency assignment problem on G can be relaxed into the following SDP (see 
H eq. (5)]): 


max ((^)L(IF) - iDiag(IFe), A) 
s.t. diag(A) = e, A G X £V, 


(49) 


where 7(IF) := Diag(IFe) — IF is the Laplacian matrix, e G 37" is the vector of all ones, and 
R = {A G 5" I L < A < 7/}, 


Lij — 


V(i,j)G77, 

—oo otherwise, 1 oo otherwise 


with k > 1 being a given integer and 7/ is a given subset of £. 
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(vi) For the SDP problems described in (45) arising from relaxing the BIQ problems, in order 
to get tighter bounds, we may add in some valid inequalities to get the following problems: 


min ^(Q, Y) + (c, x) 

s.t. diag(y) — X = 0, a = 1, X = 


Y X 

T 

X a 


€S!t, X e V, 


0 < —Yij + Xi < 1, 0 < —Yij + Xj <1, —1 < Yij — Xi — Xj < 0, 

yi<i<j, j<n-l, 


(50) 


where V = {X G \ X >0}. For convenience, we call the problem in (50) an extended BIQ 
problem. Note that the last set of inequalities in 


are obtained from the valid inequalities 
0 < Xi{l — Xj) < 1, 0 < Xj{l — Xj) < 1, 0 < (1 — Xi)(l — Xj) < 1 when Xj, Xj are binary variables. 


6.2 Numerical results 

In this section, we compare the performance of the ABCD, APG, eARBCG and BCD methods 
for solving the LSSDP (P). Note that the BCD method follows the template described in (|^ with 
g = 4 and x = ((Z, v), S, ue, Vi)- All our computational results are obtained by running Matlab 
on a workstation (20-core, Intel Xeon E5-2670 v2 @ 2.5GHz, 64GB RAM). 

In our numerical experiments, for problem (P), we assume that 7 := max{||G||, Hfl'H} < 1, 
otherwise we can solve an equivalent rescaled problem 

.p. min \\\X - Gf + \\\s - gf 

s.t. Ae{X) = Ie, AiX -s = 0, X g si, X gP, sgX, 

where G = ^, g = ^, Be = P = {X \ 7 A G V}, X = {s | 7 s G JC}. Note that {X, s) is a 
solution to (P) if and only if is a solution to (P). 

Note that under a suitable Slater’s condition, the KKT conditions for (P) and (D) are given 
as follows: 


AeX = bE, AjX — s = 0, X — Y = 0, 
X = Usn{A*EyE + A*jyi + Z + G), 

Y = Ur{A),yE + A*jyi + S + G), 
s = Ufc{g-yi). 


Thus we measure the accuracy of an approximate optimal solution {Z,v, S,yE,yi) for (D) by 
using the following relative residual: 


7? = max{??i,ri2,%}, (52) 

where r/i = ^ = ^s^iA^yE + A*jyi + Z + G), Y = 

Il-p{A*^yE + A*jyi -I- S' -I- G), s = Hjcig — yi)- Additionally, we compute the relative gap defined 
by 


^ ^MX-G\\^m\s-g\\^-F(Z,v,S,VE,yi) 

im\X-Gp+'^\\s-g\\^ + \F{Z,v,S,yE,yi)\' 


(53) 
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Let e > 0 be a given accuracy tolerance. We terminate the ABCD, APG, eARBCG and BGD 
methods when rj < e. 

Table shows the number of problems that have been successfully solved to the accuracy of 
10“® in r] by each of the four solvers: ABGD, APG, eARBGG and BGD, with the maximum 
number of iterations set at 2500C|^ As can be seen, only ABCD can solve all the problems to 
the desired accuracy of 10“®. The performance of the BCD method is especially poor, and it can 
only solve 201 problems out of 616 to the desired accuracy. 

Table shows the number of problems that have been successfully solved to the accuracy of 
10“®, 10“^ and 10“^ in rj by the solver ABCD, with the maximum number of iterations set at 
25000. As can be seen, ABCD can even solve almost all the problems to the high accuracy of 
10 “®. 


Table 1: Number of problems which are solved to the accuracy of 10 ® in rj. 


problem set (No.) \ solver 

ABCD 

APG 

eARBCG 

BCD 

0+ (64) 

64 

64 

64 

11 

FAP ( 7) 

7 

7 

7 

7 

QAP (95) 

95 

95 

24 

0 

BIQ (165) 

165 

165 

165 

65 

RCP (120) 

120 

120 

120 

108 

exBIQ (165) 

165 

141 

165 

10 

Total (616) 

616 

592 

545 

201 


Table 2: Number of problems which are solved to the accuracy of 10 10 ^ and 10 in rj by 

the ABCD method. _ 


problem set (No.) \ e 

10 -^^ 

10 -’^ 

io-« 

0+ (64) 

64 

58 

52 

FAP ( 7) 

7 

7 

7 

QAP (95) 

95 

95 

95 

BIQ (165) 

165 

165 

165 

RCP (120) 

120 

120 

118 

exBIQ (165) 

165 

165 

165 

Total (616) 

616 

610 

602 


Table [^compares the performance of the solvers ABCD, APG and eARBCG on a subset of the 
616 tested LSSDP problems using the tolerance e = 10“®. The full table for all the 616 problems is 
available at http://www.math.nus.edu.sg/~niattohkc/publist.htnil, The first three columns 
of Table give the problem name, the dimensions of the variables ue {'itie) and yj (mj), the size 
of the matrix C {us) in (P), respectively. The number of iteration^ the relative residual rj and 

^The maximum number of iterations for eARBCG is 25000g, where q is the number of block variables, i.e., g = 4 
for extended BIQ problems and g = 3 otherwise, since eARBCG only updates one block variable at each iteration. 

^Note that the two numbers of iterations in ABCD are for ABCD-2 and ABCD-1, respectively. 
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relative gap r]g, as well as computation times (in the format hours:minutes:seconds) are listed in 
the last twelve columns. As can be seen, ABCD is much faster than APG and eARBCG for most 
of the problems. 

Figure[^ shows the performance profiles of the ABCD, APG, eARBCG and BCD methods for 
all the 616 tested problems. We recall that a point (x, y) is in the performance profile curve of 
a method if and only if it can solve exactly (100y)% of all the tested problems at most x times 
slower than any other methods. It can be seen that ABCD outperforms the other 3 methods by a 
significant margin. Furthermore, the ABCD method is more than ten times faster than the BCD 
method for vast majority of the problems. It is quite surprising that a simple novel acceleration 
step with a special BCD cycle can improve the performance of the standard Gauss-Seidel BCD 
method by such a dramatic margin. 


Performance Profile {64 0^, 7 FAP, 95 QAP, 165 BIQ, 120 RCP, 165exBIQ problems) tol = 1e-06 



Figure 1: Performance profiles of ABCD, APG, eARBCG and BCD on [1,10]. 


Figure shows the tolerance profiles of the ABCD method for all the 616 tested problems. 
Note that a point (x, y) is in the tolerance prohle curve if and only if it can solve exactly (100?/)% 
of all the tested problems at most x times slower than the time taken to reach the tolerance of 
10 -®. 

7 Conclusions 

We have designed an inexact accelerated block coordinate gradient descent (ABCGD) method 
for solving a multi-block convex minimization problem whose objective is the sum of a coupled 
smooth function with Lipschitz continuous gradient and a separable (possibly nonsmooth) function 
involving only the first two blocks. An important class of problems with the specified structure 
is the dual of least squares semidefinite programming (LSSDP) where the primal matrix variable 
must satisfy given linear equality and inequality constraints, and must also lie in the intersection 
of the cone of positive semidefinite matrices and a simple polyhedral set. 

Our inexact ABCGD method has 0(1 /iteration complexity if the subproblems are solved 
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Tolerance Profile (64 0 ,7 FAP, 95 QAP, 165 BIQ, 120 RCP, 165 exBIQ problems) 



Figure 2: Tolerance profiles of ABCD on [1,10]. 


progressively more accurately. The design of our ABCGD method relied on recent advances in 
the symmetric Gauss-Seidel technique for solving a convex minimization problem whose objective 
is the sum of a multi-block quadratic function and a non-smooth function involving only the first 
block. Extensive numerical experiments on various classes of over 600 large scale LSSDP problems 
demonstrate that our ABCGD method, which reduces to the ABCD method for LSSDP problems, 
not only can solve the problems to high accuracy, but it is also far more efficient than (a) the 
well known BCD method, (b) the eARBCG (an enhanced version of the accelerated randomized 
block coordinate gradient) method, and (c) the APG (accelerated proximal gradient) method. 
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Table 3: Performance of ABCD, APG and eARBCG on 0_|_, FAP, QAP, BIQ, RCP and 
extended BIQ problems {s = 10~®) 
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Table 3: Performance of ABCD, APG and eARBCG on 0_|_, FAP, QAP, BIQ, RCP and 
extended BIQ problems {s = 10~®) 
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Table 3: Performance of ABCD, APG and eARBCG on 0_|_, FAP, QAP, BIQ, RCP and 
extended BIQ problems {s = 10~®) 
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Table 3: Performance of ABCD, APG and eARBCG on 0_|_, FAP, QAP, BIQ, RCP and 
extended BIQ problems {s = 10~®) 
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